Random Indexing using Statistical Weight Functions

Authors

  • James Gorman
  • James R. Curran
Abstract

Random Indexing is a vector space technique that provides an efficient and scalable approximation for distributional similarity problems. We present experiments showing that Random Indexing handles large volumes of data poorly, and we evaluate weighting functions as a way to improve its performance. We find that Random Indexing is robust for small data sets, but its performance degrades on large data sets because of the influence of high-frequency attributes. Appropriate weight functions improve this significantly.
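
To make the idea in the abstract concrete, here is a minimal sketch of Random Indexing with and without a weight function. It is an illustration under stated assumptions, not the authors' implementation: the positive PMI weight is a stand-in chosen for brevity (the abstract does not say which weight functions were evaluated), the toy counts are invented, and `DIM`, `NONZERO`, and all function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

DIM = 64      # reduced dimensionality (illustrative value)
NONZERO = 4   # number of +1/-1 entries per index vector (illustrative value)

def index_vector(attribute, dim=DIM, nonzero=NONZERO):
    """Sparse ternary index vector, seeded by the attribute name so it is reproducible."""
    rng = random.Random(attribute)
    positions = rng.sample(range(dim), nonzero)
    return {p: rng.choice((-1, 1)) for p in positions}

def ppmi_weights(counts):
    """Positive PMI per (term, attribute) pair. Assumed weight function, for illustration only."""
    total = sum(counts.values())
    term_total, attr_total = Counter(), Counter()
    for (term, attr), c in counts.items():
        term_total[term] += c
        attr_total[attr] += c
    return {
        (term, attr): max(math.log(c * total / (term_total[term] * attr_total[attr])), 0.0)
        for (term, attr), c in counts.items()
    }

def context_vectors(counts, weights=None):
    """Sum the index vectors of each term's attributes, scaled by raw count or by weight."""
    vectors = defaultdict(lambda: [0.0] * DIM)
    for (term, attr), c in counts.items():
        w = c if weights is None else weights[(term, attr)]
        for pos, sign in index_vector(attr).items():
            vectors[term][pos] += sign * w
    return dict(vectors)

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Invented toy co-occurrence counts: "the" is a high-frequency attribute shared by all terms.
counts = Counter({
    ("cat", "purr"): 3, ("cat", "the"): 50,
    ("kitten", "purr"): 2, ("kitten", "the"): 40,
    ("dog", "bark"): 4, ("dog", "the"): 60,
})

raw = context_vectors(counts)
weighted = context_vectors(counts, ppmi_weights(counts))

print("raw      cat~dog:", cosine(raw["cat"], raw["dog"]))
print("weighted cat~dog:", cosine(weighted["cat"], weighted["dog"]))
print("weighted cat~kitten:", cosine(weighted["cat"], weighted["kitten"]))
```

With raw counts, the shared high-frequency attribute dominates every context vector, so "cat" and "dog" look nearly identical. With the weight applied, that attribute contributes little or nothing, and similarity is driven by the more informative attributes. The weighted "cat" and "dog" score still depends on chance overlap between their random index vectors.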

Similar resources

Random Indexing Re-Hashed

This paper introduces a modified version of Random Indexing (RI), a technique for dimensionality reduction based on random projections. Here we describe how RI can be efficiently implemented using the notion of universal hashing. This eliminates the need to store any random vectors, replacing them with a small number of hash functions and thereby dramatically reducing the memory footprint. We d...

Estimation of Variance Components for Body Weight of Moghani Sheep Using B-Spline Random Regression Models

The aim of the present study was to estimate (co)variance components and genetic parameters for body weight of Moghani sheep, using random regression models based on B-spline functions. The data set included 9165 body weight records from 60 to 360 days of age from 2811 Moghani sheep, collected between 1994 and 2013 at the Jafar-Abad Animal Research and Breeding Institute, Ardabil province,...

Language Recognition using Random Indexing

Random Indexing is a simple implementation of Random Projections with a wide range of applications. It can solve a variety of problems with good accuracy without introducing much complexity. Here we demonstrate its use for identifying the language of text samples, based on a novel method of encoding letter n-grams into high-dimensional Language Vectors. Further, we show that the method is easil...

Reflective Random Indexing and indirect inference: A scalable method for discovery of implicit connections

The discovery of implicit connections between terms that do not occur together in any scientific document underlies the model of literature-based knowledge discovery first proposed by Swanson. Corpus-derived statistical models of semantic distance such as Latent Semantic Analysis (LSA) have been evaluated previously as methods for the discovery of such implicit connections. However, LSA in part...

Probability functions governing the forces and moments induced by random sea waves on a vertical pile

Using statistical characteristics is one way to account for the random nature of ocean waves. Probability functions are used to facilitate the study of random wave parameters, such as the surface, height, and period of the waves. Since the forces of ocean waves are the principal forces acting on offshore structures, the assignment of the significant structural ...

Publication date: 2006